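The ROCm (Radeon Open Compute) ecosystem is an open-source, modular, layered software stack designed to connect GPU hardware with high-performance computing. It is not a single driver but a "pipeline reality": a sequence of deployment stages that together ensure a stable, reproducible environment.
1. Modular Stack Hierarchy
ROCm components are decoupled to allow fine-grained scaling. The stack starts at the AMDGPU kernel driver, rises through ROCT (the thunk interface) and ROCR (the runtime), and ends at the HIP API and the math libraries. This architecture calls for a systematic onboarding workflow.
2. The Deployment Lifecycle
The platform reality dictates a strict dependency chain: match the kernel version against the support matrix, initialize the GPG-signed repository, resolve dependencies through the native package manager, and configure PATH and the render group to expose the hardware surface to HIP.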
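The first link in that chain, matching the kernel against the support matrix, can be sketched in Python. This is a minimal illustration only: the `SUPPORT_MATRIX` table, the `example-distro` key, and the kernel ranges in it are hypothetical placeholders, not AMD's published values, and the helper names are invented for this example.

```python
import platform
import re

# Hypothetical matrix: distro name -> (lowest, highest) supported kernel,
# inclusive, as (major, minor). Placeholder numbers, NOT AMD's real values.
SUPPORT_MATRIX = {
    "example-distro": ((5, 15), (6, 8)),
}


def parse_kernel(release):
    """Extract (major, minor) from a release string like '6.5.0-41-generic'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))


def kernel_supported(release, distro, matrix=SUPPORT_MATRIX):
    """Return True if the kernel lies inside the matrix range for the distro."""
    if distro not in matrix:
        return False
    low, high = matrix[distro]
    return low <= parse_kernel(release) <= high


def current_kernel_supported(distro, matrix=SUPPORT_MATRIX):
    """Check the running kernel, as reported by platform.release()."""
    return kernel_supported(platform.release(), distro, matrix)
```

In practice the table would be filled in from the support matrix published for the ROCm release being installed, and the check would run before any repository or package step.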
QUESTION 1
Which component acts as the 'authoritative gatekeeper' in the ROCm deployment workflow?
The HIP Runtime API
The Support Matrix
The GPG Repository Key
The LLVM Compiler Backend
✅ Correct!
Correct. The Support Matrix defines the compatible intersection of hardware, OS distributions, and kernel versions.
❌ Incorrect
The Support Matrix must be verified first to ensure hardware/software compatibility before any API or keys are used.
QUESTION 2
What is the primary purpose of 'Repository Bootstrapping'?
To compile the kernel driver from source.
To establish a trusted link to AMD servers via GPG keys and source mapping.
To allocate VRAM for the first time.
To convert CUDA code to HIP code automatically.
✅ Correct!
Yes. Bootstrapping ensures the system can securely pull authentic ROCm binaries and headers.
❌ Incorrect
Bootstrapping is about metadata and trust (keys/sources), not compilation or memory allocation.
QUESTION 3
Why does the shell usually report 'command not found' for `hipcc` immediately after installation?
The installation failed silently.
The user lacks permissions to execute the file.
ROCm binaries reside in non-standard versioned directories (e.g., /opt/rocm/bin).
The kernel fusion driver (KFD) is not loaded.
✅ Correct!
Correct. ROCm tools are installed in versioned directories to allow co-existence; the PATH must be manually updated.
❌ Incorrect
The issue is visibility. The binaries exist but are not in the system's standard executable path.
QUESTION 4
Which system group is required for a user to access GPU device files like `/dev/kfd`?
admin
render (or video)
amd-drivers
compute-users
✅ Correct!
Correct. The Linux security model restricts direct hardware interaction to members of the 'video' and 'render' groups.
❌ Incorrect
Linux uses the standard 'render' or 'video' groups for GPU device access.
QUESTION 5
What does the `rocminfo` utility verify?
Hardware temperature and clock speeds.
The successful handshake between user-space libraries and the kernel driver.
Code syntax errors in HIP applications.
Internet connectivity to AMD's update servers.
✅ Correct!
Yes. `rocminfo` checks if the HSA (Heterogeneous System Architecture) agents are reachable.
❌ Incorrect
Temperature is checked via `rocm-smi`; `rocminfo` is for stack health and topology.
Case Study: Scaling LLM Training on a Fresh Cluster
Dependency Resolution and Permissions
A DevOps engineer is setting up a new multi-GPU server for LLM training. They have installed the `amdgpu-dkms` package, but the training application fails with `hsa_init() failed`. The engineer notes that the user is not in any special groups and the environment variables are default.
Q
Based on the ROCm Platform Reality, which missing step is likely causing the 'hsa_init() failed' error?
Solution:
The user is likely missing membership in the 'render' or 'video' groups. Even if the driver is correctly installed, the application cannot open the `/dev/kfd` device file without these group permissions.
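The diagnosis above can be checked from Python before launching the training job. The sketch below is illustrative and assumes a Linux host; the function names are invented for this example, and which group actually owns `/dev/kfd` varies by distribution, so the report lists whichever of `render` and `video` is missing rather than insisting on both.

```python
import grp
import os


def missing_gpu_groups(user_groups, required=("render", "video")):
    """Return the required groups that are absent from user_groups."""
    return [name for name in required if name not in user_groups]


def current_group_names():
    """Names of the groups attached to the current process."""
    names = set()
    for gid in set(os.getgroups()) | {os.getegid()}:
        try:
            names.add(grp.getgrgid(gid).gr_name)
        except KeyError:
            # GID has no entry in the group database; skip it.
            pass
    return names


def can_open_kfd(path="/dev/kfd"):
    """True if the current process may read and write the device node."""
    return os.access(path, os.R_OK | os.W_OK)


if __name__ == "__main__":
    print("missing groups:", missing_gpu_groups(current_group_names()))
    print("/dev/kfd accessible:", can_open_kfd())
```

Because group membership is read when a session starts, a check run in a shell opened before the membership change may still report the groups as missing.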
Q
Which command should the engineer run to grant the necessary hardware access to the current user?
Solution:
`sudo usermod -aG render,video $USER`, followed by a full logout and login so that the new group membership takes effect.
Q
If the application still cannot find the HIP compiler, what environmental change is required?
Solution:
The engineer must append the ROCm bin directory to the PATH variable:
`export PATH=$PATH:/opt/rocm/bin`
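The same visibility problem can be detected from a script. The following is a small sketch, not part of ROCm: `find_hipcc` and `path_with_rocm` are hypothetical helper names, and `/opt/rocm/bin` is taken from the case study as the assumed install location.

```python
import os
import shutil


def find_hipcc(extra_dirs=("/opt/rocm/bin",)):
    """Look for hipcc on PATH first, then in the extra directories.

    Returns the full path, or None if hipcc is not found anywhere.
    """
    found = shutil.which("hipcc")
    if found:
        return found
    return shutil.which("hipcc", path=os.pathsep.join(extra_dirs))


def path_with_rocm(current_path, rocm_bin="/opt/rocm/bin"):
    """Return current_path with rocm_bin appended, without duplicating it."""
    parts = [p for p in current_path.split(os.pathsep) if p]
    if rocm_bin not in parts:
        parts.append(rocm_bin)
    return os.pathsep.join(parts)


if __name__ == "__main__":
    print("hipcc:", find_hipcc())
    print("suggested PATH:", path_with_rocm(os.environ.get("PATH", "")))
```

If `find_hipcc()` returns a path under `/opt/rocm/bin` while the shell still reports 'command not found', the binaries are installed and only the PATH needs updating.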